This invention relates generally to methods and apparatus for encoding languages for machine processing and more specifically to methods and apparatus for converting characters of the Chinese, Japanese and Korean languages, i.e., characters of a Chinese character-based language, into a numeric code, pictorially storing the characters according to the code, and accessing the characters for processing.
Coding Chinese characters for the purpose of efficient retrieval began about 300 years ago with Chinese Emperor Kang-Hsi, who directed his scholars to develop an indexing method for Chinese characters. This effort resulted in the development of the Radical Coding (Notation) Method, which is still widely used by publishers of dictionaries of Chinese character-based languages. The primary drawback of this method is that it is a non-numerical coding system; that is, the characters are classified by forms, called radicals, that are common to a plurality of characters, and by the total number of strokes of each character. Many individuals have attempted to convert the Radical Coding Method to a digital process, but, to date, only limited success has been reported.
At the turn of the 20th century a new coding method, known as the Four Corner Coding (Notation) Method, was developed by Mr. Wang, Yun-Wu. This coding method divides all the stroke forms used in drawing the characters into ten categories and assigns a numerical code from 0 to 9 to each of the categories. Then, for each character, the stroke form at each of the four corners and at a sub-corner is encoded as the digit of the category to which it belongs. Thus, a five-digit code is generated for each character. This is the current alternative indexing method for characters used in Chinese dictionaries. However, this method does not uniquely encode each of the characters, so many ambiguities exist in this coding scheme. An ambiguity, as used herein, exists when two or more characters share the same numerical code. Since the invention of the Four Corner Coding Method, many attempts have been made to improve the method.
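By way of illustration only, the notion of an ambiguity defined above can be sketched in a few lines of Python. The sketch assumes that a table mapping each character to its five-digit code is already available. The function name `find_ambiguities`, the placeholder character names, and the code values shown are hypothetical and are not taken from any actual Four Corner dictionary.

```python
from collections import defaultdict


def find_ambiguities(code_table):
    """Return the codes that are shared by two or more characters.

    code_table maps a character (any hashable label) to its five-digit
    code, given as a string so that leading zeros are preserved.
    """
    chars_by_code = defaultdict(list)
    for char, code in code_table.items():
        # A code is five decimal digits: four corners plus one sub-corner.
        if len(code) != 5 or not (code.isascii() and code.isdigit()):
            raise ValueError(f"not a five-digit code: {code!r}")
        chars_by_code[code].append(char)
    # Keep only the codes that fail to identify a single character.
    return {code: chars for code, chars in chars_by_code.items() if len(chars) > 1}


# Hypothetical sample data: placeholder names and made-up code values.
sample_table = {
    "char_A": "10104",
    "char_B": "10104",  # same code as char_A, hence an ambiguity
    "char_C": "44227",
}

ambiguities = find_ambiguities(sample_table)
for code, chars in ambiguities.items():
    print(code, chars)
```

Because a five-digit decimal code admits at most 100,000 distinct values, any character set larger than that would necessarily contain ambiguities. In a smaller set they arise whenever different characters happen to present the same stroke forms at the encoded positions.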